USER-CENTRIC MEASUREMENT OF QUALITY OF SERVICE
IN A COMPUTER NETWORK 

TECHNICAL FIELD OF THE INVENTION 

[0001] The present invention relates in general to methods and apparatus for the
monitoring and assessment of computer system performance, and more particularly
to a system and method for user-centric measurement of quality of service in a
computer network.

BACKGROUND OF THE INVENTION

[0002] Computer networks of, e.g., businesses have several to thousands of users, and
concomitantly several to thousands of workstations or other information access points
(IAPs). Information Technology (IT) professionals typically are tasked with
maintaining and enhancing the operability of each computer on the system.
Traditionally, network performance has been assessed by measuring resources which
are common across the network, such as network load, application response time and
server activity. From these measurements, the service level of end users has been
inferred. These traditional, server-centric level-of-service measures may still not
give an accurate picture of IAP operability as experienced by individual users.

SUMMARY OF THE INVENTION

[0003] The present invention provides methods, apparatus, systems and prerecorded
media for measuring and assessing the quality of service (QoS) at each of a plurality
of information access points (IAPs), such as workstations, preferably in a network.
The QoS at any IAP and for any predetermined period of time is preferably
represented as a single numeric QoS index. QoS indices are also preferably
calculated for certain ones of the software applications running on the IAP. Metrics
for the quality of service for an entire network, or of groups of the IAPs thereon, can
be calculated as a function of the user-centric QoS indices. The QoS application
indices can likewise be combined to obtain network- or group-wide measurements of
application performance.

[0004] According to one aspect of the invention, the QoS index for an IAP is
calculated by a QoS application or module, preferably resident on the IAP, as a
function of several factors, the identities and weights of which are set according to an
operational user (OU) profile. Each user on the network is assigned one of a
predetermined plurality of OU profiles according to the user's responsibilities in the
enterprise.

[0005] In a further aspect of the invention, the QoS module observes the operation of
the IAP for the occurrence of any of a number of predetermined exceptions to normal
performance. These exceptions can be of various kinds and severities. For each one
of several computer functions or conditions, performance criteria are established for
each of several states. A "green", or normal, state has no criteria associated with it,
but the rest of the states, indicative of progressively degraded degrees of
performance, have predetermined stored criteria associated with them. For example,
an exception may be declared based on processor usage, memory usage, printer
access, or input/output rate. An exception will also be declared if an IAP application
hangs or crashes, the entire IAP hangs or crashes, or network resources become
unavailable. In a preferred embodiment, different weights are accorded these
different exceptions according to their identities, the identity of the user associated
with the IAP, the length of time during which the exception occurred, the time of day
or day of week during which the exception occurred, the severity of the sensed
problem, and/or the particular application for which the exception was noted. The
QoS index or indices is/are calculated as a function of these weighted exceptions.

[0006] In one embodiment, the QoS module yields a QoS index and state or color
indicating the relative performance of the IAP. A range of colors may be used, such
as green (completely operational), yellow (slightly degraded), orange (degraded), red
(not usable) and black (crashed or frozen). The QoS index and IAP state can be
monitored over time to determine changes in state and trends in performance.

[0007] In a still further aspect of the invention, the QoS module includes a user 
attendance probe which senses whether the user is present at the time the exception 
occurred. Exceptions occurring while the user is present are preferably given more 
weight in the QoS index calculation than exceptions occurring when the user is not 
there. 

[0008] In yet another aspect of the invention, termed by the inventors "responsibility
tracking", the QoS module takes a snapshot of the IAP at the time that the exception
occurred, recording the identities of the software applications, or more particularly
application processes (some applications can be several processes running in parallel),
then running on the IAP. The snapshot and other data concerning the type and
importance of the exception are recorded in a log. Preferably the contents of the log
are periodically uploaded to a master (such as may be mounted on a QoS server for a
network to which the IAP is connected). The network
administrator is able to discern patterns of performance based on which applications
were running during the periods for which the exceptions were recorded. The QoS
module also includes logic to determine, by measurement of the relative use of one or
more system resources (such as memory usage), which of the running software
applications are the top candidates for having caused the exception.

Attorney Docket No. 310035-000001

[0009] In a still further aspect of the invention, the QoS system is heuristic in that
software application performance history is used to determine current expected
performance norms for the applications, and these in turn are used to determine an
amount of deviation from such norms that will be tolerated before an exception is
declared.

[0010] The present invention is technically advantageous in that it takes a user-centric
approach to measure the performance of day-to-day operations and to evaluate the
effects of changes to the environment such as workload increases, the addition of new
applications and modifications to the network. The present invention bases its
assessment on service criteria from individual user perspectives and uses metrics to
quantify the service level experienced by the user. The present invention provides a
heuristic modeling of applications in order to detect variations from the standard. It
uses the detection of user presence at a workstation to more accurately weight the
effect of performance variations and introduces the assignment of responsibility for
performance problems to activities at the workstation.

BRIEF DESCRIPTION OF THE DRAWINGS

[0011] Further aspects of the invention and their advantages can be discerned in the 
following detailed description, in which like characters denote like parts and in which: 

[0012] FIGURE 1 is a schematic diagram of a computer network employing the
invention;

[0013] FIGURE 2 is a high level schematic diagram of a representative workstation
employing the invention and being a node in the network of FIGURE 1;

[0014] FIGURE 3 is a schematic diagram of the architecture of the software installed
on the workstation shown in FIGURE 2;

[0015] FIGURE 4 is a flow diagram showing operation of the QoS assessment system
according to the invention;

[0016] FIGURE 5 is a more detailed block diagram of a QoS module according to the
invention, showing functional components and data flows to and from the
components;

[0017] FIGURE 6 is a detailed block diagram of an application performance meter
according to the invention;

[0018] FIGURE 7 is a table showing a representative selection of operational user
(OU) profiles and their exception weighting factors;

[0019] FIGURE 8 is a table showing predetermined criteria for different states for a
range of different computer functions or conditions in a representative operational
user (OU) profile;




[0020] FIGURE 9 is a flow diagram showing steps in an exception capture process
according to the invention;

[0021] FIGURE 10 is a flow diagram showing calculation and uploading of QoS
indices and related system snapshots;

[0022] FIGURE 11 is a representative graph of a QoS index for one IAP over a
number of days;

[0023] FIGURE 12 is a graph over time showing QoS indices for two groups of IAPs;

[0024] FIGURE 13 is a graph over time showing, for a single group of IAPs, a
plurality of QoS application indices, one for each of a preselected set of applications;

[0025] FIGURE 14 is a three dimensional bar graph showing QoS application indices
by date and by software application;

[0026] FIGURE 15 is a representative user presence graph report;

[0027] FIGURE 16 is a representative application modeling comparison report;

[0028] FIGURE 17 is a graph using QoS application indices ranking applications
according to their relative contributions to problems; and

[0029] FIGURE 18 is a graph showing software applications measured by usage and
response times.

DETAILED DESCRIPTION

[0030] While the present invention could have utility on a standalone computer, it is
optimally deployed on workstations and server(s) in a network, a representative one
of which is indicated generally at 100 in FIGURE 1. The illustrated network 100 is a
local area network (LAN) although the invention has application to wide area
networks (WANs) up to and including the Internet. The illustrated network 100 is in
a star topology, wherein each node connects to a hub, router or switch or combination
thereof 102, and employs the Ethernet communications protocol. Other topologies
and other network communication protocols could alternatively be adopted for use
with the invention, such as bus or ring topologies. The Quality of Service assessment
tool according to the invention is independent of the networking topology (ring, star,
bus), transmission medium (coax, optical fiber, twisted pair, wireless), access method
(carrier sense, token passing), protocol (Ethernet, Token Ring, FDDI, ISDN, ATM,
Frame Relay, ARCnet) or technology (LAN, WAN), and will function equally well
on a network having any particular combination of these characteristics.

[0031] The network includes a plurality of workstations 104, one, more or all of
which alternatively could be more limited sorts of user information access points
(IAPs) such as desktops, laptops or even PDAs. A representative five workstations
104 are shown, but a LAN could have as few as two and as many as hundreds or
thousands. The network 100 will also have at least one server 106 and in many
instances will have several such servers (examples of which are shown at 106, 108,
110, 112) that offer various network services and perform different collective
functions for network 100. Servers 106 - 112 may be physically distinct or may be
virtual divisions of fewer than the number of physical units shown. Network 100 can
also have connected to it other network devices, represented here by exemplary
printer 105.

[0032] A server 112, for example, may be tasked as an internet communications
firewall to the Internet 114. A server 110 may be used as an email server and to host
a web site or an entire application service provider (ASP). Server 106 may include
an array of hard drives or other mass storage units and may furnish data backup and
auxiliary storage capabilities to the workstations 104. Server 106 also may host one
or more software applications accessible by the users at workstations 104.

[0033] Important here is server 108, which, among its other possible functions, acts as
a data aggregating and report generating server for the Quality of Service (QoS)
system provided according to the invention. One of servers 106, 108, 110 could also
act as a gateway to wireless handheld IAPs, from cell phones to PDAs. Any such
gateway would be connected to a wireless transmitter (not shown).

[0034] A representative physical structure of one of the IAPs or workstations 104 is
shown in FIGURE 2. The workstation will have at least one CPU or processor 200.
Workstation 104 could have a second or more processors and may have, in addition to
representative CPU 200, a specialized coprocessor for mathematics, graphics or the
like. Current microprocessor architecture places a memory cache 202 on-chip with
the processor 200 for fast memory access. The CPU 200 is connected via two or
more buses, here represented schematically by a single bus 204, but actually being
made up by physical buses of varying speeds and instantiated by a chipset (not
shown). The bus 204 communicates with a series of peripheral devices, some of
which are internal to the workstation 104 and some of which are not.

[0035] For example, bus 204 will always communicate with a nonvolatile electronic
memory unit 206, on which is loaded a basic input/output system or BIOS. Unit 206
typically is an EEPROM. Workstation 104 will likely have a large random access
memory (RAM) unit 208 that may be distributed among several cards and/or chips.

[0036] There must be some method of communicating with a user, so workstation 
104 will typically have a controller 210 which receives inputs from a keyboard 212, a 
mouse 214 and possibly other input devices. Of increasing importance and 
sophistication are the image(s) that are being displayed to the user, and this may be 
done through a PCI controller 216, which itself is in communication with one or more 
graphics cards 218 (each of which has dedicated processor and memory capability), 
each of which in turn controls what appears on at least one monitor 220 that typically
is external to the CPU enclosure. Other sorts of internal communications protocols / 
controllers may be used, such as an AGP controller 222 for running other displays or 
graphics capabilities and a SCSI controller 224 for operating one or more magnetic 
hard drives 226, optical drives 228 or other peripheral I/O devices (not shown). The 
optical drive(s) 228 may be read or read/write and may include CD, DVD or both 
capabilities. Other kinds of mass storage are possible, such as tape drives, and 
hereinafter the inventors' reference to "disk" should be understood to refer to any 
mass storage unit from and to which data may be read and written. 

[0037] Computers of recent vintage will typically have many universal serial
bus (USB) ports 230, a representative one of which is shown here. USB ports 230
may be used to connect to a keyboard, a mouse (both in replacement of controller
210), a printer (not shown; increasingly used in substitution for the older-style parallel
port), PDAs, scanners, memory sticks and noncomputer components such as
high-fidelity audio equipment or computer-controllable appliances.

[0038] Also in increasing use are wireless transmitters 232, such as those employing
the IEEE 802.11 or IEEE 802.15 "wifi" wireless communications protocols,
Bluetooth, or other wireless protocols. The illustrated, exemplary workstation 104
has a network interface card 234 for hardwire connection to the network 100, which in
the illustrated embodiment is an Ethernet card. Nodes of the network 100 could also
be connected to network printers (such as printer 105 in FIGURE 1), scanners, fax
machines and other network devices.

[0039] The user-centric quality of service (QoS) module 312 (FIGURE 3) according
to the invention measures the performance of certain components of the IAP 104 and
the quality and volume of their communications with other nodes on network 100.
Among other things, the QoS module, application or utility tracks usage of processor
200, usage of RAM 208, and input/output rates of data transmission to and from one
or more illustrated peripherals, whether inside the workstation enclosure, immediately
peripheral to it or connected through network 100. Traffic to and from hard drive or
other mass storage unit 226 is of particular interest.

[0040] Illustrated workstation 104 and server 108 are general-purpose hardware
components turned into special-purpose computers by executing instructions of
several software programs loaded onto them. While software-programmable
general-purpose computers are by far the most common practice as of the time of writing,
hardwired, special-purpose computers are also a possibility. One or more of the
functions of a computer 104 may alternatively be implemented by hardwired logic.

[0041] The present invention may be supplied in the form of computer programs on
prerecorded media such as optical disks, electronic memories or magnetic media.
These programs are read into RAM 208 or other memory and executed by CPU 200
as a portion of its execution cycle, in effect turning CPU 200 into a special-purpose
processor which performs the QoS monitoring and assessment functions and the
application metering functions described below. Alternatively, software embodying
the present invention may be downloaded from a server (such as server 108) to each
IAP 104.

[0042] FIGURE 3 schematically shows the software architecture of the representative
workstation 104 after booting up. The BIOS 300 is first loaded into RAM 208 upon
bootup and this is followed by the main operating system (OS) 302. Such operating
systems can be among the Microsoft Windows® family, the Apple®/Macintosh®
family, Unix®, Linux® or others. The operating system 302 at its most basic controls
the input/output of the processor 200 and RAM 208 to each of the peripherals. As
newer generations of operating systems have been introduced, they have gotten larger
and have taken on more functionality. Important here is a typical current OS's
responsibility for hard drive or other mass storage I/O 304, USB I/O 306, reads and
writes from and to optical media 308, a print manager/spooler 310, and reads from and
writes to other objects, such as from and to other devices on the Ethernet network 100
or web sites on the Internet 114. The OS 302 also manages access to memory, and
allocates CPU cycles to the various processes running on the IAP 104.

[0043] On top of the operating system 302 are software applications, some of which
are meant to run all of the time and others of which are meant to be loaded only when
the user selects them. A Quality of Service (QoS) software application or module 312
preferably runs all of the time and gathers data on the performance of the OS 302, and
particularly its management of RAM 208, processor 200 usage, I/O operations and
other characteristics as will be detailed below. An application meter 313 takes note of
which other applications, and which processes of those applications (some
applications may run two or more processes in parallel), are running on the
workstation 104 and how much of the system resources these applications are using.
These may include an internet browser 314, such as Netscape® or Microsoft®
Internet Explorer®; an email/contact application 316, such as Microsoft® Outlook®;
and a word processor 318, such as Corel® WordPerfect® or Microsoft® Word®.
Typically running in the background is an antivirus application 320, such as those
provided by McAfee® and Norton®. Other user-selectable applications may include
a spreadsheet application 322, such as Excel® or Quicken®; presentation software
324, such as Microsoft® PowerPoint®; a document viewer 326 such as Adobe®
Acrobat®; and a computer assisted design application 328 such as AutoCAD® or
ProEngineer®. There are myriad other possible applications as represented by the
dotted line. In a network environment, the network administrator typically will set
policies concerning the use and presence of these software applications. Network
administrators are interested in learning how these applications perform in their
environments, individually and in conjunction with various combinations of other
software, and the present invention provides a powerful diagnostic tool in analyzing
their performance.

[0044] The industry has adopted a standard by which operating system 302 makes
available, through various calls to operating system functions, certain metrics or
variables by which system performance can be tracked. This standard is called
Web-Based Enterprise Management (WBEM) and Microsoft's version of this standard is
called "WMI". QoS module 312 and application meter 313 take advantage of these
standards.

[0045] While in the illustrated embodiment the QoS module 312 and application
performance meter 313 are shown as software applications separate from the
operating system 302, in other embodiments they can be integrated into the operating
system 302.

[0046] FIGURE 4 is a flowchart of a system overview showing the operation of the
QoS module and application performance meter according to the invention. Activities
occurring at the local workstation or IAP 104 are shown above the dotted line.
Activities occurring at the QoS server 108 are shown below the dotted line.

[0047] The QoS module 312 at the local IAP 104 monitors quality of IAP operation.
A user profile 400, supplied to the workstation by the QoS server 108, is used as a
record or basis against which present IAP performance is measured. IAP exception
capture component 402 senses when any of a plurality of functions or conditions of
the IAP enters into a degraded state of performance, predetermined criteria for which
have been stored in OU profile 400. At that point, an exception is noted and is
captured. Module 404 includes a quality of service (QoS) index calculator which
calculates IAP and application QoS indices as normalized over the time periods for
which these indices are calculated. Data concerning the exception and the QoS
indices are recorded on local disk 406.

[0048] The application performance meter 313 meters the performance of each of the
applications running on the workstation or other IAP. Measurements such as CPU
usage, memory usage and I/O rates are captured by application performance meter
313 and stored at 410 in a local performance matrix. At predetermined intervals, such
as once a day, the results of the QoS and metering activities of QoS module 312 and
meter 313 are uploaded onto a database 412 of the QoS server 108. Logic 414
performs QoS metric calculations and compiles responsibility tracking from the data
made available by database 412. As will be explained below, responsibility tracking
identifies, from the applications which were running at the time that an exception
occurred, likely candidates for exception causation. Preferably, module 414 makes at
least three kinds of reports available to the system administrator: a QoS index 416, a
responsibility tracking report 418, and application performance measurements 420.
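
The candidate-ranking step of responsibility tracking can be illustrated with a minimal Python sketch. It assumes that memory usage is the resource being compared and that each snapshot is a list of per-process records; the record layout, the `rank_candidates` name and the sample figures are hypothetical and are not taken from the specification.

```python
from dataclasses import dataclass


@dataclass
class ProcessSample:
    """One process captured in a snapshot (hypothetical record layout)."""
    app: str          # application the process belongs to
    memory_mb: float  # memory in use when the exception was declared


def rank_candidates(snapshot, top_n=3):
    """Sum memory per application and return the heaviest users first."""
    totals = {}
    for proc in snapshot:
        # An application may run several processes in parallel, so aggregate.
        totals[proc.app] = totals.get(proc.app, 0.0) + proc.memory_mb
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:top_n]


# Illustrative snapshot taken at the time of an exception (made-up values).
snapshot = [
    ProcessSample("browser", 310.0),
    ProcessSample("browser", 120.0),
    ProcessSample("word_processor", 95.0),
    ProcessSample("antivirus", 60.0),
]
candidates = rank_candidates(snapshot, top_n=2)
```

Aggregating by application rather than by process reflects the observation that some applications consist of several parallel processes; another resource, such as CPU usage, could be substituted for memory in the same structure.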

[0049] FIGURE 5 illustrates the functional components of the Quality of Service
module 312 in more detail. The QoS module 312 includes an exception capture
component 402 having as its function the declaration or attribution of an exception C.
Component 402 does this by periodically monitoring various measurements of system
performance. It obtains most of these data by executing calls through the operating
system 302. The information that it periodically obtains includes network
availability, log events, printing spooler status and printer availability, CPU usage,
memory usage, input/output rate, and response time to a user command or input. In
one embodiment, the recorded I/O rate can simply be the rate of writes to and reads
from the hard disk(s) 226 of IAP 104; in conventional PCs, most of the I/O activity
occurs to these devices. A large I/O rate indicates that the executing processes have
had to use memory resources other than those made available by the system RAM,
typically a condition which is sought to be avoided. Therefore, a high I/O rate to the
hard disk(s) is indicative of a degradation in system performance. A related metric is
paging. Some paging is necessary because it is often the case that the operating
system plus the running applications use an amount of memory which exceeds the
physical memory present. In one embodiment the paging rate is also captured.

[0050] Exception capture component 402 also accepts inputs from a user exception 
manual indicator 450. In the illustrated embodiment, the operational user can decide 
that the computer operation is in less than optimum condition and signal to the 
network that he or she believes that its performance has been degraded. Through 
indicator 450, the user can write an exception C to the exception capture component 
402, together with an indication of its severity (yellow, orange, red, black). 

[0051] The exception capture component 402 takes these data and compares them
against respective criteria stored in the OU profile. The OU profile 400 also includes
baseline performance data 452 for each of the software applications capable of
running on IAP 104, and exception capture component 402 will declare an exception
if the performance of any running application deviates too far from baseline values.
An exception will also be declared if one of the monitored IAP functions or
conditions is found to have entered into any of several states each indicative of
degraded performance (e.g., yellow, orange, red, black).
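
The baseline comparison can be sketched as follows. The specification does not state here how "too far" is measured, so this minimal Python example assumes the baseline is held as a mean and a standard deviation and that the tolerance is a multiple of that standard deviation; the function name, the parameters and the default of three standard deviations are all assumptions made for illustration.

```python
def deviates_from_baseline(observed, baseline_mean, baseline_std, tolerance=3.0):
    """Return True when an observation falls outside the tolerated band.

    The band is baseline_mean +/- tolerance * baseline_std (assumed rule).
    """
    if baseline_std == 0:
        # A flat baseline tolerates no deviation at all.
        return observed != baseline_mean
    return abs(observed - baseline_mean) > tolerance * baseline_std


# Hypothetical CPU-usage baseline for one application: mean 40%, std 10%.
exception_declared = deviates_from_baseline(95.0, 40.0, 10.0)
within_norm = not deviates_from_baseline(55.0, 40.0, 10.0)
```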

[0052] Once the exception capture component 402 has declared an exception this
fact is communicated to a snapshot module or component 454. The snapshot
component 454 uses the instance of an exception as an interrupt and takes a snapshot
of the applications then running on the IAP 104 and their use of system resources.
These data are written to local disk 406.

[0053] Responsive to receiving an instance of an exception, snapshot component 454
will also read the status of a user presence probe 456 which runs a clock from the last
time the user operated a keyboard, mouse or other user input device. The probe 456
toggles a bit between "unattended" and "attended" status. If the clocked time is
sufficiently long, the user presence probe 456 times out, decides that the user is not
present and sets an "unattended" status; if the last recorded time of user input is
relatively recent, the user presence probe will be in an "attended" status. A new input
from the keyboard or mouse will reset status from "unattended" to "attended" and
restart the attendance clock. This presence or absence of the user is communicated as
one of the factors Fwjj to log 406 and preferably is used in the calculation of the QoS
index.
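
The timeout behavior of the user presence probe can be illustrated with a short Python sketch. Timestamps are passed in explicitly so that the example is deterministic; the class name, the method names and the five-minute timeout are assumptions, since the specification says only that the clocked time must be "sufficiently long".

```python
class UserPresenceProbe:
    """Tracks time since the last user input (illustrative sketch only)."""

    def __init__(self, timeout_s=300.0):
        self.timeout_s = timeout_s  # assumed timeout; not given in the text
        self.last_input = None      # time of the last keyboard/mouse event

    def on_user_input(self, now):
        """A new input restarts the attendance clock."""
        self.last_input = now

    def status(self, now):
        """Report 'attended' if input was seen within the timeout window."""
        if self.last_input is None or now - self.last_input > self.timeout_s:
            return "unattended"
        return "attended"


probe = UserPresenceProbe(timeout_s=300.0)
probe.on_user_input(now=1000.0)
status_soon = probe.status(now=1100.0)   # 100 s after the last input
status_late = probe.status(now=1400.0)   # 400 s after the last input
probe.on_user_input(now=1401.0)          # a new input resets the status
status_reset = probe.status(now=1402.0)
```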

[0054] The record 406 accumulates these exceptions through the day, and organizes
them into predetermined intervals, such as fifteen-minute intervals. For each such
interval there is also recorded the presence or absence of a user and the time of day.
Database 406 also records certain circumstances relating to the exception, such as the
type of condition, function and/or application triggering the exception. As will be
detailed below, these circumstances are used to weight the importance of each
exception in the calculation of a quality of service (QoS) index by QoS calculator
460. QoS indices for both the IAP as a whole and certain ones of the applications
running on the IAP are likewise written to the local disk 406.
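
A minimal Python sketch of the interval bookkeeping follows. The index formula is not given in this part of the description, so the example uses a stand-in (100 minus the sum of weighted exceptions, floored at zero) purely to show how interval grouping and presence weighting could fit together; the field names, the presence multiplier and the formula itself are assumptions.

```python
from collections import defaultdict

INTERVAL_S = 15 * 60  # fifteen-minute intervals, as in the example above


def bucket_exceptions(exceptions):
    """Group exception records by the interval their timestamp falls in."""
    buckets = defaultdict(list)
    for exc in exceptions:
        buckets[exc["t"] // INTERVAL_S].append(exc)
    return dict(buckets)


def interval_index(events, presence_factor=2.0):
    """Stand-in index: 100 minus weighted exceptions (assumed formula)."""
    penalty = 0.0
    for exc in events:
        # Exceptions seen while the user is present count for more.
        factor = presence_factor if exc["attended"] else 1.0
        penalty += exc["weight"] * factor
    return max(0.0, 100.0 - penalty)


# Timestamps are seconds since the start of the day (made-up records).
log = [
    {"t": 60, "weight": 5.0, "attended": True},
    {"t": 800, "weight": 3.0, "attended": False},
    {"t": 1000, "weight": 4.0, "attended": True},
]
buckets = bucket_exceptions(log)
indices = {k: interval_index(v) for k, v in buckets.items()}
```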

[0055] At predetermined intervals, such as once a day, the contents of the local QoS
database 406 are uploaded to the network QoS database 412. The server 108, based
on application performance data (see FIGURE 6), determines a normal operations
baseline and downloads the parameters of this to the local IAP 104 and more
particularly incorporates this baseline into OU profile 400. FIGURE 6 provides more
detail on application performance meter 313. At predetermined intervals, such as
every fifteen minutes, it gathers data on each software application then running on
IAP 104. In the illustrated embodiment, these data include application use of CPU
cycles and memory, and numbers of page faults, handles and threads. In a simplified
embodiment the numbers of page faults, handles and threads are not recorded and the
invention instead relies on its measurement of memory use alone rather than in
combination with these other parameters, which have a high correlation to an amount
of memory use. Meter 313 also records the application's response to user input,
network traffic and disk traffic. These data are preferably obtained through operating
system 302. The resultant data are written to a performance matrix 410. At
predetermined intervals, such as once a day, the performance matrix 410 is uploaded
to QoS server 108 and database 412. The performance data are used to derive
baseline application performance criteria 452, which are periodically downloaded to
each IAP 104.
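
One way the server might reduce uploaded performance data to baseline criteria is sketched below in Python. The specification does not say which statistics form the baseline, so the mean and population standard deviation used here are assumptions, as are the function name and the dictionary layout of the performance matrix.

```python
from statistics import mean, pstdev


def derive_baselines(matrix):
    """Map each application to (mean, std) of its recorded samples.

    `matrix` is a hypothetical layout: {application: [sample, ...]}.
    """
    return {
        app: (mean(samples), pstdev(samples))
        for app, samples in matrix.items()
        if samples  # skip applications with no recorded samples
    }


# Made-up CPU-usage percentages gathered at fifteen-minute intervals.
performance_matrix = {
    "browser": [10.0, 20.0, 30.0],
    "word_processor": [5.0, 5.0, 5.0],
    "idle_app": [],
}
baselines = derive_baselines(performance_matrix)
```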

[0056] The operational user profile 400 is one of a set of such profiles stored on QoS
server 108. Representative profiles are illustrated in the table shown in FIGURE 7.
Each IAP user or "operational user" is assigned to one of the profiles stored on server
108. At least one selected profile is installed locally on each IAP or workstation 104
as it is added to the system. If an IAP 104 has two or more users of different types,
two or more profiles will be installed. The profile 400 defines the parameters or
factors against which performance will be judged. Shown here is a representative set
of factors to be applied to any raw or unweighted exception. These include event
type, which can be CPU, memory, network, I/O or response time; time of day and day
of week factors; and, for those exceptions caused by application malfunction,
application type. While any number of operational user profiles may be stored by
server 108, most businesses will require between two and five types.

[0057] In the example shown in FIGURE 7, three user types are given. A user type 1
might be the average desktop user. Such a user will work during normal business
hours and use his or her desktop computer for normal office applications, such as
word processing, email and spreadsheets. User types 2 and 3, on the other hand, are
meant for users who have a more mission-critical role. These users may be customer
support representatives, tellers or an order desk. The weighting of the factors in their
profile reflects the increased importance of the network, response times and their
applications. Further, they work different schedules than the first user type.
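
An operational user profile of this kind can be modeled as a small data structure. The Python sketch below is illustrative only: the weight values, the working hours and the multiplicative combination of event-type and time-of-day factors are assumptions, because the actual entries of FIGURE 7 are not reproduced in the text.

```python
from dataclasses import dataclass


@dataclass
class OUProfile:
    """Hypothetical operational user profile with exception weights."""
    name: str
    event_weights: dict   # weight per event type (cpu, memory, network, ...)
    working_hours: range  # hours of the day when the user is expected to work
    working_factor: float = 1.5  # assumed multiplier during working hours

    def weight(self, event_type, hour):
        """Combine the event-type weight with the time-of-day factor."""
        w = self.event_weights.get(event_type, 1.0)
        if hour in self.working_hours:
            w *= self.working_factor
        return w


# User type 1: average desktop user on normal business hours (made-up values).
desktop = OUProfile(
    "type 1",
    {"cpu": 1.0, "memory": 1.0, "network": 1.0, "io": 1.0, "response_time": 1.0},
    range(9, 17),
)
# User type 2: mission-critical role, network and response time weigh more.
support = OUProfile(
    "type 2",
    {"cpu": 1.0, "memory": 1.0, "network": 2.0, "io": 1.0, "response_time": 2.0},
    range(0, 24),
)
```

With these illustrative numbers, a network exception at 8 p.m. carries a weight of 1.0 for the desktop user and 3.0 for the support user, which mirrors the greater importance of the network to mission-critical roles.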

[0058] A particular workstation, node or IAP can be in one of a plurality of
predetermined states of operation, five of which are used in the illustrated
embodiment:

State     Interpretation
Green     Completely Operational
Yellow    Slightly Degraded
Orange    Degraded
Red       Not Usable
Black     Crashed or Frozen

[0059] The OU profile defines the thresholds used to move between these states based
on different factors. Thresholds in the profile may be adjusted to reflect values that
may be appropriate for different environments. As discussed above, the system
administrator may define different profiles for different types of users. An example of
the kind of state information contained in an operational user profile is set forth in the
table shown in FIGURE 8. A movement into a yellow, orange, red or black (crashed)
state is declared as an exception by exception capture component 402 (FIGURE 5).

[0060] The values of the I/O row in the table are preferably expressed as percentages
of the maximum rate achievable for a particular period of time. For example, if the
IAP has an I/O rate which is more than 30% of the maximum sustainable I/O rate for
more than one minute, this may be predetermined to constitute a yellow condition and
an exception. In one embodiment, the exception capture component 402 concerns
itself only with writes to and reads from the hard drive or other mass storage device.
Sustained intensive I/O activity of this kind is an indication of insufficient RAM for
the number of applications running. Further, a high level of I/O activity will cause
other operations in the system to slow and will degrade performance of other
applications using disk activity.
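
The sustained-threshold test described in this paragraph can be sketched as follows. Only the yellow figure (more than 30% for more than one minute) comes from the example above; the orange and red limits, the ten-second sampling period and all names are assumptions introduced for illustration.

```python
from enum import IntEnum


class State(IntEnum):
    GREEN = 0
    YELLOW = 1
    ORANGE = 2
    RED = 3
    BLACK = 4


# (percent of maximum sustainable I/O rate, resulting state), worst first.
# Only the 30% yellow limit appears in the text; 60 and 90 are invented.
IO_THRESHOLDS = [(90.0, State.RED), (60.0, State.ORANGE), (30.0, State.YELLOW)]


def classify_io(samples, sample_seconds=10, sustain_seconds=60):
    """Return the worst state whose limit was exceeded for longer than
    sustain_seconds without interruption.

    samples: I/O rates as a percentage of the maximum sustainable rate,
    one reading every sample_seconds.
    """
    for limit, state in IO_THRESHOLDS:
        run = 0  # length in seconds of the current run above this limit
        for pct in samples:
            run = run + sample_seconds if pct > limit else 0
            if run > sustain_seconds:
                return state
    return State.GREEN
```

A brief burst of disk activity therefore leaves the IAP green; only a run lasting longer than the sustain period moves it to a degraded state.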

[0061] In FIGURE 8, the response time row concerns the time it takes to respond to
any user input; for example, when a software application presents a button to the user
for him or her to click on, how long does it take for the application to present the next
window after the user clicks on the button? The "system crash" factor relates to
applications that are off while they should be on, are hanging or have crashed.
Finally, in one preferred embodiment, the user may find it useful to manually identify
an IAP which is responding in a degraded fashion. This can be used to corroborate
other indications of degraded performance, or possibly to identify situations that are
not caught by other means.

[0062] One of the salient aspects of the invention is the user-centered approach by
which the data are gathered and the importance thereof ascertained. As shown in
FIGURES 4 and 5, the performance is monitored and in large part problems are
assessed at the local level. This direct measurement of the impact of the workstation
performance on each user is aggregated to a system-wide impact.

[0063] The single metric for QoS or quality of service index is a result of applying a
formula with variable weights to the exceptions collected from the IAP. These are
weighted to permit varying the importance of the different exceptions. Once the
weighting has been established based on each installation's requirements and goals, a
QoS metric is produced as a reliable, repeatable index of IAP performance. QoS
indices for each IAP or workstation can be used to create other QoS indices for
aggregate work groups, departments, locations, organizations and/or enterprise
performance. The QoS index permits the comparison of IAPs, or groups of IAPs, to
each other. Since the QoS index is calculated for each of several times and is
normalized to remove effects caused by differences in time periods over which
different events are measured, the index may be compared to indices calculated for
different lengths of time. Since a QoS metric is also calculated for each application
individually on each IAP (a calculated QoS is attributed to the IAP in its entirety and
is also distributed to each of several suspect software applications then running on it),
levels of service by application can be ascertained across IAPs, time periods, and
groups.
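
As one possible illustration of combining per-IAP indices into group-level indices, the sketch below takes the arithmetic mean of the indices of the IAPs in each group. The text does not specify the combining function, so the use of a mean, like the function and argument names, is an assumption.

```python
from collections import defaultdict
from statistics import mean


def group_qos(iap_indices, group_of):
    """Combine per-IAP QoS indices into one index per group.

    iap_indices: {iap_id: qos_index}
    group_of:    {iap_id: group_name}, e.g. a location or department
    The arithmetic mean is an assumed combining function.
    """
    grouped = defaultdict(list)
    for iap, index in iap_indices.items():
        grouped[group_of[iap]].append(index)
    return {group: mean(values) for group, values in grouped.items()}
```

Because each index is already normalized per minute, indices from IAPs measured over different periods can be averaged in this way without further scaling.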

[0064] A more detailed process flow of the exception capture and "snapshot"
components of the local QoS application 312 is shown in FIGURE 9. After an
operation metric is compared with a respective criterion, benchmark or threshold
stored in the OU profile 400 (FIGURES 4 and 5), an exception and interrupt are
declared at 600. The exception capture component 402 periodically monitors several
functions and conditions of the IAP (see FIGURE 5), and compares the sensed
performance levels with respective criteria stored in the OU profile 400. The kind
and severity of the exception is logged at step 602 to local disk 406; the exception can
be a change in privilege or user states respecting the CPU, the amount of memory in
use, I/O rates, accessibility to printing, the amount of disk space which remains
available to user, an application crash, an entire system crash, a delayed response
time, or a change of state regarding the availability of the network in general. The
occurrence of an exception also prompts, at step 604, the snapshot component 458 of
the QoS module 312 to take a snapshot of the applications which are presently
running on the IAP, and the user presence probe 456 determines, through analysis of
user interface inputs (keyboard, mouse), whether the user is present. This task list
snapshot is also transferred to the log 406.

[0065] Hence, the state (green, yellow, orange, red or black) of the IAP is maintained
by the QoS module 312 and a log entry records the time and state change each time a
change occurs. At the same time, a snapshot of activity tasks and their resources is
taken and stored on the local machine.
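
The log-on-change behavior just described can be sketched as follows. The class, its fields and the shape of a log entry are hypothetical; an in-memory list stands in for the log on local disk 406.

```python
from dataclasses import dataclass, field


@dataclass
class StateTracker:
    """Hypothetical holder of the current IAP state and its change log."""
    state: str = "green"
    log: list = field(default_factory=list)  # stand-in for local log 406

    def update(self, new_state, timestamp, tasks, user_present):
        """Record a log entry with a task snapshot only when the state changes.

        Returns the new entry, or None if the state is unchanged.
        """
        if new_state == self.state:
            return None
        entry = {
            "time": timestamp,
            "from": self.state,
            "to": new_state,
            "user_present": user_present,
            "snapshot": list(tasks),  # copy of the running-task list
        }
        self.log.append(entry)
        self.state = new_state
        return entry
```

Repeated readings in the same state produce no entries, so the log holds one record per transition together with the task list captured at that moment.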

[0066] FIGURE 10 shows the workflow of the larger QoS system. At the end of a
predetermined period such as an entire day 800, the QoS module 312 reads the log
and snapshots 406 at step 802. At step 804, the QoS application calculates the QoS
for the first interval in the record which contains at least one exception C. At decision
point 806, the QoS monitor asks whether or not more than one task or application is
being performed by the IAP. In the instance that more than one task is being
performed, the QoS value derived from the interval in question is distributed among
suspected applications at step 808 to create several QoS application indices.

[0067] In step 808, the record of applications running at the time of the exception is
inspected, together with the amount of memory and/or other system resource(s) being
used by each such application. In the illustrated embodiment, the top three memory
users are flagged. The QoS index as calculated for the IAP and for each implicated
application is recorded at step 810 to a record 812. At step 814, if there is a further
interval containing at least one exception, the procedure returns to step 802, at which
data concerning the next interval are read and processed. In the illustrated
embodiment, QoS indices for the whole IAP, and for suspect ones of running
applications, are calculated for each 15-minute interval of time during the day,
producing for that day an array of QoS indices for the IAP and a matrix of QoS
indices for each of the most prominent ones of the software applications running at
any point that day. At step 816, the QoS values for the day are uploaded to server 108
for storage in the database and reporting.
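
One way the distribution among suspect applications might be carried out is sketched below. The text states only that the top three memory users are flagged; sharing the interval's QoS value in proportion to memory use is an assumption made for this example, as are the function and parameter names.

```python
def distribute_qos(interval_qos, memory_by_app, top_n=3):
    """Split one interval's QoS value among the top_n memory users.

    memory_by_app: {application_name: memory_in_use} from the snapshot.
    Proportional sharing is an assumed rule, not one stated in the text.
    """
    top = sorted(memory_by_app.items(), key=lambda kv: kv[1], reverse=True)[:top_n]
    total = sum(mem for _, mem in top)
    if total == 0:
        return {}
    return {app: interval_qos * mem / total for app, mem in top}
```

For instance, with an interval value of 100 and four running applications using 50, 30, 20 and 5 units of memory, the first three would receive 50, 30 and 20 respectively and the fourth would not be implicated.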

[0068] As described above, once a raw, untreated exception C has been attributed and
a value assigned according to its severity, it is weighted by appropriate ones of the
weight factors Fw read from the operational user profile, in one embodiment
taking into account the following factors: kind of event, which can be a service
interruption or deterioration event; duration of the service interruption or deterioration
event; presence or absence of the user at the time of the event; hour when the event
occurred, with the event happening during open business hours being considered to be
more severe than one occurring when the business is closed; day when the event
occurred, with a similar emphasis placed on days on which the enterprise is open for
business; a system application event (more severe) or a user application event (less
severe); and whether the involved application is critical or noncritical. Other
weighting schemes can be employed, using fewer or more factors, different factors,
and/or different weights.

[0069] In the illustrated embodiment, a QoS index is calculated for an IAP for each of
a predetermined number of intervals, which in the illustrated embodiment are each
fifteen minutes. All sensed exceptions or state changes occurring within the interval
are aggregated to the QoS index for that period. A QoS index is assigned to every
state change which occurs on an IAP. The state change is weighted by all applicable
factors as a function of the user type assigned to the user of the IAP. The sum of all
of the weighted state changes is divided by the number of minutes for the period for
which the calculation is made, to arrive at a normalized index which can be compared
with an index calculated for a different event or events, which may have occurred
over a shorter or longer period, another application or another IAP. The calculation
can be represented by the formula:

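\[ QoS = \frac{1}{T_{min}} \sum_{i=1}^{n} \left( C_i \times Fw_1 \times \cdots \times Fw_m \right) \]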



[0070] In the above equation, each exception Ci is assigned a base value depending on
its color state. According to one embodiment, yellow is given a base value of 100,
orange 200, red 500 and black 2000. A "green" condition is not considered an
exception (or alternatively may be considered to have a base value of zero).
Fw1 ... Fwm are the weight factors used to modify each one of the exceptions; they
will vary, according to the OU profile, by user presence, time of day, listed
application, duration, and other factors. Tmin is the interval in minutes for which the
QoS index is calculated and during which the exception(s) occurred.

[0071] The QoS index thus derived is a number which represents the relative
performance of an IAP. The index will not vary as a function of the period over
which the exceptions and conditions contributing to it were measured, and therefore
can be averaged or compared with the QoS of other IAPs, or groups of IAPs, having
QoS indices that were developed over different time periods. Like QoS indices are
developed for the different software applications according to an algorithm such as
that illustrated in FIGURE 10.
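
The calculation described in the preceding paragraphs can be illustrated with a short worked sketch. The base values are those given above; treating the weight factors as multipliers of the base value is an assumption, and the function name and sample weights are invented for the example.

```python
from math import prod

# Base values per state as given in the text; green is not an exception.
BASE_VALUES = {"green": 0, "yellow": 100, "orange": 200, "red": 500, "black": 2000}


def qos_index(exceptions, interval_minutes=15):
    """Sum the weighted exceptions and normalize by the interval length.

    exceptions: iterable of (state, weights), where weights is the list of
    factors Fw1 ... Fwm read from the OU profile (assumed multiplicative).
    """
    total = sum(BASE_VALUES[state] * prod(weights) for state, weights in exceptions)
    return total / interval_minutes


# One yellow exception weighted 1.5 and 2.0, one orange weighted 1.5:
# (100 * 3.0 + 200 * 1.5) / 15 = 40.0
sample = [("yellow", [1.5, 2.0]), ("orange", [1.5])]
index_15 = qos_index(sample, interval_minutes=15)

# The same exception density over 30 minutes yields the same index.
index_30 = qos_index(sample + sample, interval_minutes=30)
```

Dividing by the interval length is what allows an index computed over fifteen minutes to be compared with one computed over a longer or shorter period.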

[0072] The FIGURES below give examples of the different report types that can be
generated with the collected information. FIGURE 11 shows a QoS by day for the
month of May for a single IAP. A lower QoS means better performance.

[0073] FIGURE 12 compares the QoS by day for groups of IAPs in two different
locations, the examples given being Chicago and Paris.

[0074] FIGURE 13 shows a QoS comparison for a group of IAPs by software
application, with separate quality of service indices being computed in this example
for Excel®, Outlook®, Access® and Word®. FIGURE 14 is a three-dimensional
graphic comparing QoS indices for different applications on different dates.

[0075] As noted above, the QoS module 312 monitors user input by the keyboard or
mouse. The QoS monitor 312 determines whether a user is present at an IAP by the
recency of input or control operations.

[0076] This status is associated with the performance of the IAP by time records
maintained by the QoS application, in which the percentage of attended time is
calculated for each fifteen-minute time slice by which performance records are
maintained. The attended/unattended status is also recorded in each exception record
snapshot 604 that is generated by an exception to the OU profile variables loaded on
the IAP.
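
A minimal sketch of the attended-time calculation follows. It assumes that the user is counted as present for a fixed period after each keyboard or mouse input; the 120-second idle timeout, the use of timestamps in seconds and the function name are all assumptions, since the text specifies only that presence is judged by the recency of input.

```python
def attended_percent(input_times, slice_start, slice_seconds=900, idle_timeout=120):
    """Percentage of a time slice during which the user counts as present.

    input_times: timestamps (seconds) of keyboard or mouse inputs.
    The user is assumed present for idle_timeout seconds after each input.
    """
    slice_end = slice_start + slice_seconds
    covered = 0
    cursor = slice_start  # end of the presence time already counted
    for t in sorted(input_times):
        start = max(t, cursor)
        end = min(t + idle_timeout, slice_end)
        if end > start:
            covered += end - start
            cursor = end
    return 100.0 * covered / slice_seconds
```

Overlapping presence windows are counted once, and inputs falling outside the slice contribute only the portion of their window that lies within it.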

[0077] FIGURE 15 shows a report which graphically represents the user presence at
an IAP. The large, rear peaks are a portion of a graph 1300 of the percentage of time
during each 15-minute interval in which the user was present at the machine. The
peaks in front of these show the number of exceptions of yellow (1302), orange
(1304), and red (1306) varieties. The graph does not show any black exceptions
occurring. As shown, graphics may also be compiled showing, at 1308, relative
workstation availability by QoS states during user presence and, at 1310, an exception
severity distribution among non-normal states.

[0078] Each application launched at least once on an IAP generates a performance
matrix. This matrix records all of the information about the consumption of resources
by the application in question and calculates response time, memory consumption,
CPU consumption, amount of network and disk traffic, optionally handle usage and
possibly other information, such as information of an instantaneous nature (CPU and
memory) and of an aggregate nature. This information is stored in the matrix and is
used to establish a baseline of normal operation, made available to the QoS exception
capture component. Deviations from this baseline by predetermined amounts are
considered exceptions and are recorded as exceptions. FIGURE 16 is a graphical
report showing a comparison between the resource usage for a particular application
(iexplore.exe) during a production day (1402) and the usage for previous days (1404).
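
The baseline comparison described above can be sketched as follows. Taking the baseline to be the mean of previously recorded values, and the predetermined amount to be a fractional tolerance of 50%, are assumptions made for illustration; the function name is likewise hypothetical.

```python
from statistics import mean


def deviates_from_baseline(history, current, tolerance=0.5):
    """Return True if current exceeds the baseline by more than tolerance.

    history:   earlier measurements of one resource for one application
    current:   the present measurement of the same resource
    tolerance: assumed predetermined amount, as a fraction of the baseline
    """
    if not history:
        return False  # no baseline yet, so nothing to deviate from
    baseline = mean(history)
    return current > baseline * (1.0 + tolerance)
```

With a recorded history averaging 100 units, a reading of 160 would be treated as an exception under this sketch, while a reading of 140 would not.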

[0079] Each time a change in status occurs, the present invention, through the
application matrices, investigates the causes by inspecting and analyzing the list of
active tasks. The illustrated embodiment of the present invention may designate up to
three causes (tasks or applications) which contributed to a particular penalty or
exception. These causes are aggregated at the central database, and reports may be
produced that represent the most frequently occurring causes for events. Each
running application has recorded for it several measurements of resource usage. In
the illustrated embodiment these include CPU cycles, usage time, peak memory,
aggregated number of page faults, aggregated number of handles and aggregated
number of threads launched by the application in question. The top three users as
identified by any of these resource measurements are identified and recorded as
candidates for contributing to the exceptions. In an alternative embodiment, data for
page faults, handles and threads are not kept, more reliance on memory usage
measurements being had instead.
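
The aggregation of designated causes into a most-frequent-causes report might be sketched as follows. The record layout (a dictionary with a "causes" list) and the function name are hypothetical; only the limit of three causes per exception comes from the text.

```python
from collections import Counter


def most_frequent_causes(exception_records, limit=5):
    """Count how often each application was designated a cause.

    exception_records: iterable of dicts, each with a "causes" list of
    application names; at most three causes are counted per exception.
    Returns (application, count) pairs, most frequent first.
    """
    counts = Counter()
    for record in exception_records:
        counts.update(record["causes"][:3])
    return counts.most_common(limit)
```

A report built on such counts would list first the applications most often implicated in exceptions across the IAPs reporting to the central database.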

[0080] These reports may be organized by IAP, groups, locations, IAP types, or other
groupings. This provides the capability of producing responsibility reports for both
individuals and groups.

[0081] At the time that an exception is captured, the snapshot 604 which is taken
includes its resource utilization. During QoS processing, as shown in FIGURE 17,
the responsibility for each application is derived from the event log and its associated
snapshot. This provides both application-specific QoS indices and information to
identify the applications responsible for the exceptions. FIGURE 18 is a graph of top
problem event contributors for an exemplary case. Each of several software
applications or tasks has a measurement for QoS index (penalty), user's time and
response time. These metrics can be used by an IT administrator to discern which
running software applications are causing the most problems for system performance,
and what kinds of problems those are.

[0082] In summary, methods, systems, apparatus and prerecorded media for quality
of service assessment have been shown and described. Employing a user-centric
approach, the present invention monitors each of several parameters on individual
IAPs or workstations, and weights exceptions to normal performance according to
values read from prerecorded operational user profiles. The quality of service
indices are functions of the severity of the sensed exceptions and whether or not the
user is present. As compiled, these data provide analytical tools for the IT
administrator in determining which software applications require attention.

[0083] While an illustrated embodiment of the present invention has been described 
and illustrated in the appended drawings, the present invention is not limited thereto 
but only by the scope and spirit of the appended claims.